Skip to content

feat(llmo): CloudFront CDN log delivery with assume-role setup (LLMO-5566)#2680

Open
claudiaboldis wants to merge 6 commits into
mainfrom
feature/LLMO-5566-cloudfront-log-delivery-assume-role
Open

feat(llmo): CloudFront CDN log delivery with assume-role setup (LLMO-5566)#2680
claudiaboldis wants to merge 6 commits into
mainfrom
feature/LLMO-5566-cloudfront-log-delivery-assume-role

Conversation

@claudiaboldis

Copy link
Copy Markdown

Summary

  • Adds server-side STS AssumeRole (connector role) so the LLMO UI CloudFront wizard never needs AWS credentials in the browser — all CloudFront and CloudWatch Logs calls are performed by the Lambda
  • Adds 5 read-only wizard endpoints (connect, distributions, prerequisites, origins, behaviors) that power the step-by-step CloudFront BYOCDN setup flow
  • Adds POST /sites/:siteId/llmo/cdn-log-delivery (diagram step 8) — idempotently creates a CloudWatch Logs delivery source in the customer's account and links it to Adobe's cross-account delivery destination, enabling CloudFront standard log push to the cdn-logs S3 bucket

New files

File Purpose
src/support/edge-optimize.js assumeConnectorRole (STS), listCloudFrontDistributions, getDistributionConfig, idempotent step-on-poll deploy orchestrator
src/support/cdn-log-delivery.js Provider-agnostic log-delivery registry; CloudFront path uses logs:PutDeliverySource + logs:CreateDelivery via the assumed role

Endpoints (all POST, all INTERNAL_ROUTES — admin/IMS-only)

  • POST /sites/:siteId/llmo/edge-optimize/connect
  • POST /sites/:siteId/llmo/edge-optimize/distributions
  • POST /sites/:siteId/llmo/edge-optimize/prerequisites
  • POST /sites/:siteId/llmo/edge-optimize/origins
  • POST /sites/:siteId/llmo/edge-optimize/behaviors
  • POST /sites/:siteId/llmo/cdn-log-delivery

⚠️ Cross-service dependencies (not in this PR — must land before prod)

  1. Connector-role IAM policy (customer-bootstrap-role.yaml in EDGE_OPTIMIZE_TEMPLATE_BUCKET) must add logs:PutDeliverySource, logs:CreateDelivery, logs:GetDeliverySource, logs:DescribeDeliveries — today the role only grants CloudFront perms
  2. spacecat-auth-service must provision the cdn-logs-<org> delivery destination + cross-account policy for the BYOCDN CloudFront path (the handler surfaces a clear "destination not provisioned" 400 if it's missing)

Test plan

# verify connector role assumable
curl -s -X POST "https://spacecat.experiencecloud.live/sites/$SITE_ID/llmo/edge-optimize/connect" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"accountId":"123456789012","externalId":"<externalId>"}'
# Expected: HTTP 200 { "connected": true }
# list distributions via assumed role
curl -s -X POST "https://spacecat.experiencecloud.live/sites/$SITE_ID/llmo/edge-optimize/distributions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"accountId":"123456789012","externalId":"<externalId>"}'
# Expected: HTTP 200 { "distributions": [...] }
# enable log delivery (idempotent)
curl -s -X POST "https://spacecat.experiencecloud.live/sites/$SITE_ID/llmo/cdn-log-delivery" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"accountId":"123456789012","externalId":"<externalId>","distributionId":"EDFDVBD6EXAMPLE"}'
# Expected: HTTP 200 { "alreadyExisted": false, "deliveryId": "..." }
# Re-run: HTTP 200 { "alreadyExisted": true }

Related

🤖 Generated with Claude Code

@codecov

codecov Bot commented Jun 25, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@github-actions

Copy link
Copy Markdown

This PR will trigger a minor release when merged.

claudiaboldis pushed a commit that referenced this pull request Jun 26, 2026
…d, CF templates, setupType, rescan endpoint

- gateEdgeOptimizeWizard now derives externalId = toSafeAwsName(imsOrgId) server-side; FE no longer sends externalId
- getEdgeOptimizeBootstrapUrl accepts setupType ('log-only'|'log-and-oae') and selects the matching CF template key
- Add customer-bootstrap-role.yaml and customer-bootstrap-role-log-only.yaml CloudFormation templates
- Add POST /sites/:siteId/llmo/cdn-log-rescan (rescanCdnLogDelivery) for idempotent re-enabling of log delivery
- Remove 14 stale 'externalId missing' tests; add 9 tests for rescanCdnLogDelivery

Introduced by: #2680
claudiaboldis pushed a commit that referenced this pull request Jun 26, 2026
claudiaboldis pushed a commit that referenced this pull request Jun 26, 2026
…MO-5566)

- Remove dead !Organization early-return from gateEdgeOptimizeWizard
- Add test for custom EDGE_OPTIMIZE_ROLE_NAME env var path
- Add test for rejection with no .message (covers 'unknown error' fallback)

Introduced by: #2680
claudiaboldis pushed a commit that referenced this pull request Jun 26, 2026
…strapUrl (LLMO-5566)

- Add test for log-only setupType without EDGE_OPTIMIZE_TEMPLATE_KEY_LOG_ONLY (default fallback)
- Add test for missing IMS org ID in the bootstrap URL inline org check

Introduced by: #2680
@claudiaboldis claudiaboldis requested a review from ABHA61 June 26, 2026 22:09
main #2682 shipped the CloudFront onboarding wizard (LlmoCloudFrontController,
cdn-onboard/cloudfront/*) while this branch was open, making the branch's parallel
edge-optimize/* wizard redundant. Reset onto main and rebuilt around the only net-new
slice: CloudWatch log delivery.

- Add enableCdnLogDelivery + rescanCdnLogDelivery on LlmoCloudFrontController under
  cdn-onboard/cloudfront/log-delivery and .../log-rescan
- Assume-role externalId uses main's client-supplied per-session UUID
  (validateCloudfrontCredentials); imsOrgId for the org-scoped delivery destination is
  resolved server-side from the site's organization
- Reuse tokowaka CloudFrontEdgeClient.listDistributions for the rescan (no local
  edge-optimize support module needed)
- Add src/support/cdn-log-delivery.js (CloudWatch Logs delivery control plane) and
  @aws-sdk/client-cloudwatch-logs; OpenAPI + capability/route entries; tests

Introduced by: #2680
@claudiaboldis claudiaboldis force-pushed the feature/LLMO-5566-cloudfront-log-delivery-assume-role branch from f3bbb27 to 8cc66b0 Compare June 30, 2026 11:50
@ABHA61 ABHA61 requested a review from MysticatBot June 30, 2026 12:11

@MysticatBot MysticatBot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @claudiaboldis,

Verdict: Request changes - two issues to address before merge (concurrency cap and HTTP status correctness).
Complexity: HIGH - large diff; API surface + new dependency.
Changes: Adds CDN log delivery endpoints (single-distribution and bulk rescan) using STS AssumeRole to create CloudWatch Logs delivery sources in customer accounts (13 files).
Note: Recommend a human read before merge - this change modifies the OpenAPI contract (docs/openapi/). The bot review is a complement to, not a replacement for, a human read here.

Must fix before merge

  1. [Important] Unbounded concurrency in rescanCdnLogDelivery will throttle on large accounts - src/controllers/llmo/llmo-cloudfront.js:1228. Promise.allSettled fires createCdnLogDelivery for ALL distributions in parallel with no concurrency cap. Each call issues 1-3 CloudWatch Logs API calls. A customer with 50+ distributions will hit CloudWatch Logs per-account throttling (10-25 TPS), causing most calls to fail with ThrottlingException. Fix: add a concurrency limiter (e.g. p-limit(5)) or batch distributions in small groups.
  2. [Important] Missing server config returns 400 (client error) instead of 500 - src/controllers/llmo/llmo-cloudfront.js:1164. When CDN_LOG_DELIVERY_DEST_ACCOUNT_ID is not set, badRequest(...) tells the caller their input is wrong when the server is misconfigured. Fix: return internalServerError(...) instead (same applies in rescanCdnLogDelivery at line ~1218).
Non-blocking (3): minor issues and suggestions
  • nit: docs/LLMO-5566/plan.md references src/controllers/llmo/llmo.js and 16 endpoints, but the actual controller is llmo-cloudfront.js and this PR delivers 2 endpoints - the plan is misleading from day one
  • suggestion: Both new handlers diverge from the existing mutationErrorResponse pattern used by sibling endpoints in this controller (which surfaces actionable AWS errors like AccessDeniedException and ThrottlingException as 400s instead of generic 500) - consider using mutationErrorResponse in the outer catch for consistency - src/controllers/llmo/llmo-cloudfront.js:1270
  • suggestion: rescanCdnLogDelivery response includes raw outcome.reason?.message per distribution, which can expose internal AWS error details (ARNs, role names) to the caller - consider sanitizing to error category only - src/controllers/llmo/llmo-cloudfront.js:1244

@MysticatBot MysticatBot added ai-reviewed Reviewed by AI complexity:high AI-assessed PR complexity: HIGH needs-human-review AI reviewer recommends a human read before merge labels Jun 30, 2026
@MysticatBot

Copy link
Copy Markdown

Mysticat review failed: Claude CLI crashed (exit 1): stderr= stdout={"type":"result","subtype":"error_max_budget_usd","duration_ms":628407,"duration_api_ms":1527224,"is_error":true,"num_turns":38,"stop_reason":"end_turn","session_id":"7e869476-18cc-497e-b372-17f8889bac58","total_cost_usd":10.07213945,"usage":{"input_tokens":19,"cache_creation_input_tokens":224486,"c

@ABHA61 ABHA61 requested a review from MysticatBot June 30, 2026 12:26
- rescanCdnLogDelivery: cap parallel createCdnLogDelivery calls in batches of 5 to avoid
  CloudWatch Logs per-account throttling on large accounts
- missing CDN_LOG_DELIVERY_DEST_ACCOUNT_ID now returns 500 (server misconfig), not 400
- outer catch uses mutationErrorResponse so AWS AccessDenied/Throttling surface as 4xx,
  consistent with sibling CloudFront endpoints
- per-distribution rescan errors report the AWS error category (name) only, never the raw
  message (avoids leaking ARNs/role names)
- rewrite docs/LLMO-5566/plan.md to reflect the re-homed log-delivery design

Introduced by: #2680
@claudiaboldis claudiaboldis self-assigned this Jun 30, 2026
@claudiaboldis claudiaboldis requested review from MysticatBot and removed request for MysticatBot June 30, 2026 15:03

@MysticatBot MysticatBot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @claudiaboldis,

Verdict: Request changes - two issues to address (concurrency race and missing integration tests).
Complexity: HIGH - large diff; API surface + new dependency + OpenAPI contract change.
Changes: Adds CDN log delivery endpoints (single-distribution and bulk rescan) using STS AssumeRole to create CloudWatch Logs delivery sources in customer accounts, with bounded concurrency and proper error categorization (13 files).
Note: Recommend a human read before merge - this change modifies the OpenAPI contract (docs/openapi/). The bot review is a complement to, not a replacement for, a human read here.

Must fix before merge

  1. [Important] Unhandled ConflictException on concurrent CreateDelivery calls - src/support/cdn-log-delivery.js:167 (details inline)
  2. [Important] Missing integration tests for both new endpoints - CLAUDE.md rule: "New or modified endpoints must include integration tests in test/it/" (details inline)
Non-blocking (2): minor issues and suggestions
  • suggestion: Delivery source name (llmo-cf-{safeOrgId}-{distributionId}) may exceed the CloudWatch Logs 60-character delivery-source name limit for long IMS org IDs - consider a defensive length check or hash-based truncation in buildDeliverySourceName - src/support/cdn-log-delivery.js:89
  • suggestion: createCdnLogDelivery creates a new CloudWatchLogsClient per invocation - during a rescan with 50+ distributions this adds unnecessary object churn; consider accepting an optional pre-built client or lifting client creation to the caller - src/support/cdn-log-delivery.js:135

Previously flagged, now resolved

  • Unbounded concurrency in rescanCdnLogDelivery now capped at CDN_LOG_RESCAN_CONCURRENCY = 5 with a batched for-loop
  • Missing server config (CDN_LOG_DELIVERY_DEST_ACCOUNT_ID) now correctly returns internalServerError() instead of badRequest()

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 2m 27s | Cost: $8.49 | Commit: a2e7638ffd078cd5730fc1711ba46b55252fbfb0
If this code review was useful, please react with 👍. Otherwise, react with 👎.

}
nextToken = page.nextToken;
} while (nextToken);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): The idempotency check has a TOCTOU window. If enableCdnLogDelivery and rescanCdnLogDelivery run concurrently for the same distribution, both can pass the GetDeliverySource existence check, both succeed at PutDeliverySource (idempotent PUT), and then the second CreateDelivery call hits a ConflictException that is unhandled - it bubbles to the outer catch and returns a 500 to the caller.

Fix: catch ConflictException (or ResourceAlreadyExistsException) from CreateDelivery and treat it as { created: false, alreadyExisted: true }, mirroring the existing idempotency contract. This is the minimal change - the check-then-act pattern stays, but the race outcome is no longer an error.

Alternatively, if concurrent use is not expected for this admin tooling, document the mutual-exclusion constraint in the plan or a code comment.

@@ -2287,4 +2307,321 @@ describe('LlmoCloudFrontController', () => {
expect(result.status).to.equal(403);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): CLAUDE.md rule: "New or modified endpoints must include integration tests in test/it/ - add shared test logic in shared/tests/, seed data in postgres/seed-data/, and a wiring file in postgres/."

Both enableCdnLogDelivery and rescanCdnLogDelivery are new endpoints with unit tests only. Integration tests that exercise the full request lifecycle (auth gating, route resolution, response shape) through the PostgreSQL harness are required by the documented convention.

Note: these endpoints call external AWS APIs via assumed roles, so the IT scope would focus on the auth/routing/validation layers (mocking the AWS calls at the boundary), not the CloudWatch Logs interactions themselves.

Addresses MysticatBot review on PR #2680:
- cdn-log-delivery: catch ConflictException/ResourceAlreadyExistsException from CreateDelivery
  (TOCTOU between concurrent enable/rescan for the same distribution) and return
  { created:false, alreadyExisted:true } with the winning delivery id, preserving idempotency
  instead of surfacing a 500
- add integration tests (test/it/postgres + shared) for the log-delivery and log-rescan
  endpoints covering the LLMO-admin access gate (404/403) and request-body validation (400),
  mirroring the llmo-cloudflare-onboarding IT pattern

Introduced by: #2680
…LMO-5566)

Close the codecov/patch gap from the TOCTOU fix: add tests for
- CreateDelivery rejecting with a non-conflict error (rethrow path)
- conflict where the raced delivery can't be found (findExistingDelivery returns undefined)

Introduced by: #2680

@MysticatBot MysticatBot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @claudiaboldis,

Verdict: Approve - prior blocking findings are addressed; no new blocking issues.
Complexity: HIGH - large diff; API surface + new dependency.
Changes: Adds CDN log delivery endpoints (single-distribution and bulk rescan) using STS AssumeRole with ConflictException race handling and bounded concurrency (15 files).
Note: Recommend a human read before merge - this change modifies the OpenAPI contract (docs/openapi/). The bot review is a complement to, not a replacement for, a human read here.

Non-blocking (2): minor issues and suggestions
  • suggestion: buildDeliverySourceName can produce names exceeding the CloudWatch Logs 60-character delivery-source name limit for long IMS org IDs - consider a defensive length check or hash-based truncation - src/support/cdn-log-delivery.js:89
  • suggestion: After a ConflictException race in createCdnLogDelivery, findExistingDelivery() could return undefined (eventual consistency), resulting in deliveryId: null in the response - consider a single retry or an explicit note in the response when the ID cannot be resolved - src/support/cdn-log-delivery.js:199

Previously flagged, now resolved

  • Unhandled ConflictException on concurrent CreateDelivery calls now caught with idempotent fallback
  • Missing integration tests now added (access-control and validation layers covered in test/it/)
  • Unbounded concurrency in rescanCdnLogDelivery now capped at CDN_LOG_RESCAN_CONCURRENCY = 5
  • Missing server config now correctly returns internalServerError() instead of badRequest()

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 2m 11s | Cost: $6.05 | Commit: 6d39666c21acedd9e4d9ab324cd107ebe3afb640
If this code review was useful, please react with 👍. Otherwise, react with 👎.

…-5566)

Address MysticatBot non-blocking suggestions on PR #2680:
- buildDeliverySourceName now hash-truncates names over 60 chars (CloudWatch Logs limit)
  using a deterministic sha256 suffix, so long IMS org / resource ids stay valid and the
  name remains stable across calls (idempotency depends on it)
- document why deliveryId may be undefined after a ConflictException race (eventual
  consistency; delivery is enabled regardless, so no retry loop)

Introduced by: #2680
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-reviewed Reviewed by AI complexity:high AI-assessed PR complexity: HIGH needs-human-review AI reviewer recommends a human read before merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants